Engineering Release Notice
Component: SAS_FW_Image
Release Date: 02-04-2008
OEM: LSI
Version: SAS_FW_Image_APP-1.11.92-0397_BB-1.00.00.02-0007_BIOS-NT10_WEBBIOS-1.1-27-e_10-Rel_CTRLR-1.01-007B_2008_02_04
Package: 6.0.1-0085
FW_SAS 1.11.92-0397


FW_SAS
Component: FW_SAS
Stream: SAS_1.0_Dev
Version: 1.11.92-0397
Baseline From: FW_SAS_Release_1078-1.11.82-0382_2008_01_22
Baseline To: FW_SAS_Release_1078-1.11.92-0397_2008_02_04.979
CHANGE SUMMARY:
LSID100092956 (TASK) Data corruption running I/O w/ capacity expansion
LSID100081880 (TASK) FW_SAS Release Version: 1.11.92-0397
LSID100092922 (TASK) FW_SAS Release Version: 1.11.82-0396
LSID100092962 (TASK) update maintenance version in version.c
LSID100092957 (DFCT) Data corruption occurs running I/O and performing capacity expansion
DEFECT RECORDS (Total Defects=1, Number Duplicate=0):
FW_SAS DEFECTS
DFCT ID: LSID100092957
Customer DFCT No: DF193182
Headline: Data corruption occurs running I/O and performing capacity expansion
Description: Data corruption occurs running I/O and performing capacity expansion
Version of Bug Reported: 6.0.2-0001
Steps to Reproduce: Please see the OEMSpecific_recreation field.
Resolution: Fixed
Resolution Description: Defect Id: LSID100092804
Issue: Data corruption occurs running I/O and performing capacity expansion.
Analysis: During a reconstruction, MegaRAID firmware keeps the reconstructing Logical Drive online by internally maintaining two Logical Drives: one represents the portion of capacity that has yet to be reconstructed (“original LD”), while the other represents the data area that has completed data reorganization/reconstruction (“ghost LD”). When a host request is received during a reconstruction, MegaRAID firmware determines which of these two internal LDs the request falls within and assigns the request to either the original LD or the ghost LD accordingly. Requests straddling both LDs are deferred and rescheduled until the reconstruction point has advanced beyond the straddling host request.

We found that immediately after a reconstruction completes (with Diskerciser running), MegaRAID firmware processes a number of queued host commands that had been assigned to the ghost LD. These commands were received just before the reconstruction completed, were assigned to the ghost LD, and were then deferred for later execution pending completion of the active reconstruction cycle (a cycle represents the reconstruction of a given set of rows). If the active cycle happens to be the final cycle (last set of rows), its completion means the reconstruction itself has completed. This triggers the removal of the ghost LD, the reconfiguration of the original LD to match the reconstructed parameters, and the reconfiguration of the entire MegaRAID cache to reflect the new, finalized LD configuration. After these steps have completed, MegaRAID resumes processing the host commands that were queued pending completion of the active (final) reconstruction cycle.

The problem occurs when MegaRAID processes the queued requests that were assigned to the ghost LD: at this point the ghost LD no longer exists because the reconstruction has completed. These requests are nonetheless allowed to execute internally on the ghost LD; they complete without error because the ghost LD structure in memory is still valid, albeit abandoned. The corruption occurs because the cache buffers used to process these requests are assigned to the ghost LD, creating potential cache aliases to the original LD; if there is a mix of read and write commands, data in these aliased cache lines may become stale relative to the data that the write commands updated on disk.
Fix: Upon completion of a reconstruction, we move any commands pending on the ghost LD queue onto the original LD queue.
Customer Defect Track No: DF193182
Customer List: OEM -- OEM
Fix Impact: Medium
Suggested Testing: Run Diskerciser IO utility with Online OCE (Reconstruction)
Child Tasks: LSID100092956
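
Illustration (not firmware source): the request routing described in the Analysis above can be sketched as follows. This is a minimal sketch under stated assumptions. The names route_t, route_request, and recon_point are hypothetical, and the reconstruction point is assumed to be a single block address below which data has already been reconstructed. The actual MegaRAID firmware data structures are not shown in this notice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the firmware source. */
typedef enum {
    ROUTE_ORIGINAL_LD,  /* request lies entirely in the not-yet-reconstructed area */
    ROUTE_GHOST_LD,     /* request lies entirely in the already-reconstructed area */
    ROUTE_DEFER         /* request straddles the reconstruction point */
} route_t;

/*
 * Assumption: recon_point is the first block that has NOT yet been
 * reconstructed. Blocks below it belong to the ghost LD, blocks at or
 * above it belong to the original LD.
 */
static route_t route_request(uint64_t start, uint64_t count, uint64_t recon_point)
{
    assert(count > 0);              /* an empty request has no meaningful range */
    uint64_t end = start + count;   /* one past the last block of the request */

    if (end <= recon_point)
        return ROUTE_GHOST_LD;
    if (start >= recon_point)
        return ROUTE_ORIGINAL_LD;
    return ROUTE_DEFER;             /* retried once recon_point has passed 'end' */
}
```

In this sketch a deferred request is simply reported to the caller; how the firmware reschedules it is not described in this notice beyond the statement that it waits until the reconstruction point has advanced past the request.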
UCM ACTIVITY / TASK RECORDS (4):
FW_SAS UCM TASKS
Task ID: LSID100092956
Headline: Data corruption running I/O w/ capacity expansion
Description: Issue: Data corruption occurs running I/O and performing capacity expansion.

Analysis: During a reconstruction, MegaRAID firmware keeps the reconstructing Logical Drive online by internally maintaining two Logical Drives: one represents the portion of capacity that has yet to be reconstructed (“original LD”), while the other represents the data area that has completed data reorganization/reconstruction (“ghost LD”). When a host request is received during a reconstruction, MegaRAID firmware determines which of these two internal LDs the request falls within and assigns the request to either the original LD or the ghost LD accordingly. Requests straddling both LDs are deferred and rescheduled until the reconstruction point has advanced beyond the straddling host request.

We found that immediately after a reconstruction completes (with Diskerciser running), MegaRAID firmware processes a number of queued host commands that had been assigned to the ghost LD. These commands were received just before the reconstruction completed, were assigned to the ghost LD, and were then deferred for later execution pending completion of the active reconstruction cycle (a cycle represents the reconstruction of a given set of rows). If the active cycle happens to be the final cycle (last set of rows), its completion means the reconstruction itself has completed. This triggers the removal of the ghost LD, the reconfiguration of the original LD to match the reconstructed parameters, and the reconfiguration of the entire MegaRAID cache to reflect the new, finalized LD configuration. After these steps have completed, MegaRAID resumes processing the host commands that were queued pending completion of the active (final) reconstruction cycle.

The problem occurs when MegaRAID processes the queued requests that were assigned to the ghost LD: at this point the ghost LD no longer exists because the reconstruction has completed. These requests are nonetheless allowed to execute internally on the ghost LD; they complete without error because the ghost LD structure in memory is still valid, albeit abandoned. The corruption occurs because the cache buffers used to process these requests are assigned to the ghost LD, creating potential cache aliases to the original LD; if there is a mix of read and write commands, data in these aliased cache lines may become stale relative to the data that the write commands updated on disk.

Fix: Upon completion of a reconstruction, we move any commands pending on the ghost LD queue onto the original LD queue.
State: Completed
Change Set Files: 0
References:   LSID100092957(DFCT)    
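
Illustration (not firmware source): the fix described in this task, moving pending ghost LD commands onto the original LD queue when reconstruction completes, might look roughly like the sketch below. All types and function names (cmd_t, cmd_queue_t, queue_push, migrate_ghost_queue) are hypothetical. Appending the moved commands at the tail in their original order and retargeting each command's LD identifier are assumptions made for the example; this notice does not state how the firmware orders or retargets them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command and queue types for illustration only. */
typedef struct cmd {
    int id;             /* command identifier */
    int target_ld;      /* LD the command is currently assigned to */
    struct cmd *next;   /* singly linked pending list */
} cmd_t;

typedef struct {
    cmd_t *head;
    cmd_t *tail;
    size_t len;
} cmd_queue_t;

/* Append one command to the tail of a queue. */
static void queue_push(cmd_queue_t *q, cmd_t *c)
{
    c->next = NULL;
    if (q->tail)
        q->tail->next = c;
    else
        q->head = c;
    q->tail = c;
    q->len++;
}

/*
 * Move every command pending on the ghost LD queue onto the original LD
 * queue and retarget it, so that no command executes against the abandoned
 * ghost LD (and no cache buffer is allocated under it) after completion.
 * Returns the number of commands moved.
 */
static size_t migrate_ghost_queue(cmd_queue_t *ghost, cmd_queue_t *orig, int orig_ld_id)
{
    assert(ghost != NULL && orig != NULL);

    size_t moved = 0;
    cmd_t *c = ghost->head;
    while (c != NULL) {
        cmd_t *next = c->next;      /* save before queue_push clears c->next */
        c->target_ld = orig_ld_id;  /* assumption: commands carry an LD id */
        queue_push(orig, c);
        moved++;
        c = next;
    }
    ghost->head = NULL;
    ghost->tail = NULL;
    ghost->len = 0;
    return moved;
}
```

In this sketch the migration would be called once, after the final reconstruction cycle finishes and before the deferred host commands are resumed.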
FW_SAS UCM TASKS
Task ID: LSID100081880
Headline: FW_SAS Release Version: 1.11.92-0397
Description: FW_SAS Release Version: 1.11.92-0397
State: Open
Change Set Files: 0
References:  
FW_SAS UCM TASKS
Task ID: LSID100092922
Headline: FW_SAS Release Version: 1.11.82-0396
Description: FW_SAS Release Version: 1.11.82-0396
State: Open
Change Set Files: 0
References:  
FW_SAS UCM TASKS
Task ID: LSID100092962
Headline: update maintenance version in version.c
Description: update maintenance version in version.c
State: Open
Change Set Files: 0
References: